Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images, typically using optical character recognition (OCR).

ERU-KG: Efficient Reference-aligned Unsupervised Keyphrase Generation

May 30, 2025
Via arXiv

Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs

May 29, 2025
Via arXiv

PreP-OCR: A Complete Pipeline for Document Image Restoration and Enhanced OCR Accuracy

May 28, 2025
Via arXiv

StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs

May 29, 2025
Via arXiv

AutoSchemaKG: Autonomous Knowledge Graph Construction through Dynamic Schema Induction from Web-Scale Corpora

May 29, 2025
Via arXiv

Named Entity Recognition in Historical Italian: The Case of Giacomo Leopardi's Zibaldone

May 26, 2025
Via arXiv

Business as Rulesual: A Benchmark and Framework for Business Rule Flow Modeling with LLMs

May 24, 2025
Via arXiv

Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance

May 25, 2025
Via arXiv

LLM-Based Compact Reranking with Document Features for Scientific Retrieval

May 19, 2025
Via arXiv

Low-Resource Language Processing: An OCR-Driven Summarization and Translation Pipeline

May 16, 2025
Via arXiv